Comparative genomic analyses on assassin bug Rhynocoris fuscipes (Hemiptera: Reduviidae) reveal genetic bases governing the diet-shift

Summary

The genetic bases underlying biodiversity and phenotypic plasticity are fascinating questions in evolutionary biology. Such molecular diversity can be achieved at multiple omics levels. Here, we sequenced the first chromosome-level genome of the assassin bug Rhynocoris fuscipes, a polyphagous generalist predator used for biological control in agroecosystems. Compared with the non-predatory true bugs Apolygus lucorum and Riptortus pedestris, the R. fuscipes-specific genes were enriched in diet-related genes (e.g., serine proteinase, cytochrome P450), which had higher expression levels and more exons than non-diet genes. Extensive A-to-I RNA editing was identified in all three species and was enriched in diet-associated genes in R. fuscipes, diversifying the transcriptome. An extended analysis of five predaceous and 27 phytophagous hemipteran species revealed an expansion of diet-related genes in R. fuscipes. Our findings bridge the gap between genotype and phenotype and advance our understanding of the genetic and epigenetic bases governing diet shifts in true bugs.


Supplementary Tables
Figure S1. Supporting information for the R. fuscipes genome assembly, related to Figure 1. (A) GenomeScope profile with 17-mers, showing the fit of the model (black line) to the observed k-mer density plot. (B) Genome-wide Hi-C link heatmap of the final chromosomes. Each scaffold corresponds to a chromosome and each pixel corresponds to a 100-kb bin.

Figure S3. GO and KEGG analysis of diet genes in R. fuscipes, related to Figure 2. These plots show the top 20 enriched terms from Gene Ontology (green) or pathways from KEGG (yellow) for three categories: R. fus-unique (A-B), R. fus-more (C-D) and R. fus-specific (E-F). The X-axis represents the number of enriched genes and the Y-axis represents the enriched terms and pathways. Significant enrichment is marked with asterisks: *, P < 0.05; **, P < 0.01.

Figure S4. mRNA length of different categories of genes, related to Figure 2. Left panel: Class1 (R. fus-specific) genes. Right panel: Class2 (shared) genes. P value was calculated by Wilcoxon rank sum test.

Figure S5. Genes with more exons tend to have more isoforms, related to Figure 2. (A) Spearman correlation between the number of exons (X-axis) and the number of transcript isoforms (Y-axis) annotated in the R. fuscipes reference genome. (B) In the Iso-Seq results, the number of expressed transcript isoforms per gene increases with the number of exons.

Figure S6. Distribution and editing levels of RNA editing sites in three species, related to Figure 3 and Figure 4. Upper panel: the fraction of editing sites in genomic repeat regions. Red dashed lines represent the baseline genomic repeat content in each species. P values were calculated using Fisher's exact test against the baseline. ***, P < 0.001. Lower panel: editing levels of nonsynonymous and synonymous sites. P value was calculated by Wilcoxon rank sum test.

Figure S7. Synteny analysis of R. fuscipes and other hemipteran species, related to Figure 1 and the STAR Methods. Species were ordered by their phylogeny.

Figure S8. Comparison of intra-OG similarity and inter-OG similarity, related to the STAR Methods. P value was calculated by Wilcoxon rank sum test.

Figure S9. Validation and improvement of the reliability of A-to-I RNA editing sites in R. fuscipes, related to Figure 3 and the STAR Methods. (A) SNPs identified from the WGS data of R. fuscipes. (B) The fractions of A>G variations before and after the exclusion of potential SNPs. (C) A highly conserved RNA editing site in the potassium channel gene Shab, which was identified in Drosophila melanogaster (Diptera), Coridius chinensis (Hemiptera) and the three hemipteran species we used. (D) Linkage disequilibrium (LD) between SNPs or between RNA editing sites in the RNA-Seq data.

Table S7. Information on protein domains collected from the Pfam database, related to Figure 2.
Note: The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs).